This lab is a (brief) introduction to Conservation Planning in R and, more generally, to a few ways to visualize and analyze spatial data in R. The key word is introduction: this exercise should give you a taste of what I think is a very powerful and flexible tool with many potential applications for Conservation Planning.
Make sure you have downloaded and installed the latest versions of R and Rstudio.
Lots of digital ink has been spilled explaining the advantages (and limitations) of R. For the purposes of this class, I would say that R has a few advantages that make it potentially useful for Conservation Planning:
Right off the bat, I want to point you to some interesting and useful CP and R resources:
vignette. If you want to know what a package or function does, typing ? followed by the function or package name (e.g. ?raster) is always a good place to start!

We will be using some data from our previous lab on hotspots in the California Current to produce some threat layers of interest. This will be a two-part lab. In the first part, we will look at compiling different categories of threats, as the authors did in Halpern et al. (2009). The guiding research questions are:
1. What are the spatial mismatches of local vs. global threats in the CA Current?
2. (for week 2 of the lab) Where were MPAs placed? Which threats do they address?
Giant sea bass Stereolepis gigas, a critically endangered species in southern California
In the first part of the lab, we’ll be working with raster data. As a reminder, raster or gridded data are stored as a grid of values which are rendered on a map as pixels.
Some examples of raster data include oceanographic datasets such as Sea Surface Temperature, land use maps and digital elevation maps, and of course, our California Current threat data.
See a great intro to raster data in R here
Raster data can come in many different formats. In this lab, we use GeoTIFF data, which has the extension .tif. A .tif file stores metadata, or attributes, about the file as embedded tif tags. These tags can include the following raster metadata:
- CRS (coordinate reference system)
- extent
- NoDataValue
- resolution of the data

There are a lot of spatial packages for R; we will touch on some of them here, but not all of them. Here is a brief overview, taken from this site:
# if need be, you can install the packages you don't have with the command install.packages(),
# with the package names in quotes:
# install.packages(c("raster","rgdal","rasterVis","maps","rgeos","dplyr","RColorBrewer"))
# Load the libraries into this R session
library(raster) #Main raster library with nearly all functions used in this analysis
library(rgdal) #Spatial library - most functions used from rgdal are for vectors (shapefiles)
library(rasterVis) #Useful for raster visualizations
library(maps) #Has a database of maps. I use this to add a map to my raster to visualize land boundaries
library(rgeos) #Need this library for topology operations on geometries
library(dplyr) #NOT spatial - this is a data wrangling library
library(RColorBrewer) #Also not spatial - used to set the spectral color scheme
Now, we should be ready to actually begin looking at data!
For the first part of this lab, we will re-do the hotspot analysis from Week 3, to show how analyses in R are directly comparable to (and, in my opinion, easier than!) those done in other programs like ArcGIS.
Colors are important for digestible visualization in R, just like in ArcMap or any other software. I don’t particularly like the default color scheme in R, so here we set up our own.
# rainbow color scheme
cols = rev(colorRampPalette(brewer.pal(11, 'Spectral'))(255))
#setting smaller margins for plotting
par(mar=c(2,2,1,1))
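If the ramp idea is new to you: colorRampPalette() (from base grDevices, so no extra package is needed) takes a set of anchor colors and returns a *function* that interpolates any number of colors between them. A quick standalone sketch, using anchor colors of my own choosing rather than the Spectral palette above:

```r
# colorRampPalette() returns a function; calling that function with n
# gives back n interpolated hex colors spanning the anchors
ramp <- colorRampPalette(c("blue", "white", "red"))
ramp(5)  # five hex colors running from blue through white to red
```

This is exactly what the cols line above does, just with RColorBrewer's 11 Spectral anchors and 255 output colors (reversed so high values plot as red).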
We now read some raster data into R. We have to make sure we let R know where to find the data:
threats_dir <- 'E:/TA files/CP2017_Owen/R lab development/Threats_data' # Directory where all my files are
threat_files <- list.files(threats_dir,full.names = T) # List the files in this folder
threat_files # print the file names into the console
Let’s import the raster with the cumulative threat layer. Create a raster of the first file by calling raster() (think of this function like read.csv() for raster data, if you are familiar with that function).
all_threats <- raster(threat_files[2])
We can then use plot() to plot the raster; it’s that easy! To override the default color scheme, we define the argument col= as our own scheme cols from above.
plot(all_threats,col=cols)
We can add a basemap to our raster by utilizing the maps package:
# add a landmap to your shapefile. the add=T argument tells R to add it to the existing plot.
# make sure you understand what the other arguments do
plot(all_threats,ext=extent(-130,-110,24,50),col=cols)
map('world',fill=T,add=T,col='gray')
Visualizing and exploring rasters in different ways can give us a great feel for our data, even before running much analysis. You can visualize a different extent, or subset, of the data:
plot(all_threats,col=cols,ext=extent(-121,-117,32,35),main="Cumulative Threats") # A good extent for the Santa Barbara Channel
The zoom() function, by default, allows you to draw your own extent just by clicking twice.
plot(all_threats,col=cols)
# zoom(all_threats,col=cols) #Interactive code not run in html
QUESTION: Which part of the SF Bay area is at highest threat?
Beyond visualization, we can also look at some simple characteristics of the data itself. Just calling the name of our raster will give us some information:
all_threats
## class : RasterLayer
## dimensions : 3659, 4407, 16125213 (nrow, ncol, ncell)
## resolution : 0.009448675, 0.009448675 (x, y)
## extent : -138.7553, -97.11496, 21.61105, 56.18375 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## data source : E:\TA files\CP2017_Owen\R lab development\Threats_data\full_modelnv.tif
## names : full_modelnv
## values : 1.027101, 83.84592 (min, max)
QUESTION: What is the minimum value of all raster cells in our threats layer?
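Each attribute in that printed summary can also be queried individually with accessor functions from the raster package (shown here as a sketch; run them on your own all_threats object):

```r
# extent(all_threats)    # bounding box (xmin, xmax, ymin, ymax)
# res(all_threats)       # cell resolution in x and y
# ncell(all_threats)     # total number of cells
# crs(all_threats)       # coordinate reference system
# minValue(all_threats)  # minimum cell value
# maxValue(all_threats)  # maximum cell value
```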
We can look at the frequency histogram of raster cell values.
hist(all_threats,main="Cumulative Threats Frequency")
Also, the function cellStats() allows us to run some basic statistics. Type ?cellStats to read what the function can do.
cellStats(all_threats,mean)
## [1] 14.00092
QUESTION: What is the standard deviation of all raster cells in our threats layer?
Quickly visualizing raster data in R is nice, but the real power of raster analysis comes when we can perform calculations that link two or more raster layers. If you remember from our hotspots lab, our first task was to overlay the top 20% of cumulative threats with the top 20% of species richness to find threat hotspots. This will require four steps:

1. Import the species richness raster and check its attributes.
2. Crop and resample so the two layers share the same extent and resolution.
3. Reclassify each layer into a binary raster marking its top 20% of cells.
4. Overlay the two binary rasters to find the hotspots.
The species data is in the Species data directory, and we import it just as we did the threats layer, by providing a full path name. We can then check its attributes.
all_spp <- raster("E:/TA files/CP2017_Owen/R lab development/Species_data/ca_curr_sp_rich.tif")
all_spp
plot(all_spp,col=cols)
If you type all_spp and all_threats (or plot them), you should be able to tell that we may run into problems trying to immediately do calculations on them, as they have different extents and resolutions. Two helpful functions to deal with these problems are crop() and resample().
QUESTION: Before doing the next step, which of our two rasters has a greater extent? a higher (finer) resolution? What does this mean about how we should resample?
We first crop the threats layer to the extent of the species layer:
#?crop see what the crop function does
threats_crop <- crop(all_threats,all_spp) #Crop the threats layer to the same extent as species
Now the threats layer has the same extent as the species layer. But we have to resample the species layer such that it has the same resolution as the threats layer.
#?resample see what the resample function does
# NOTE: the progress='text' argument is a great tool: it prints out the progress
# of a longer-running function into the console, so you can see how the operation is going
# the method='ngb' argument specifies that we want to use a nearest neighbor algorithm to resample, instead of interpolation
spp_res <- resample(all_spp,threats_crop,method='ngb',progress='text')
We can check that the two layers will line up decently by using the stack() function. stack() creates a RasterStack object, which is exactly what it sounds like: a single object holding multiple raster layers. This isn't especially useful for just a couple of layers, but it will likely be useful to you later on when you are combining multiple threat layers.
spp_threat_stack <- stack(threats_crop,spp_res)
plot(spp_threat_stack,col=cols)
Even though these layers still look similar to how they did before, the fact that they stacked together without error means they are likely ready for combination.
Remember, we want to find the top 20% of cells in each layer, and then combine the two layers to produce hotspots. We do this with the reclassify() function from the raster package. Type ?reclassify and make sure you understand what the function does, especially the rcl argument.
First, let’s focus on the species layer.
hist(spp_res,main="Species Raster Values")
This layer has a huge number of zeroes which, judging from the graph, should probably be NAs. We can reassign zero values to NA by calling the reclassify() function.
# notice that in the following, we are OVERWRITING the original spp_res object.
# This is okay in this instance since we won't be using the old version, but
# often it is better to assign any output of a function to a new variable or object
spp_res <- reclassify(spp_res,rcl=c(-Inf,0,NA))
hist(spp_res,main="Species Raster Values, Zeroes Removed") # did the function do what we were hoping?
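If the rcl = c(-Inf, 0, NA) argument looks cryptic: it describes an interval, (-Inf, 0], and the value cells in that interval should become. The same logic applied to a plain vector, with made-up values rather than our real raster, looks like this:

```r
# base-R analog of reclassify(x, rcl = c(-Inf, 0, NA)):
# any value in the interval (-Inf, 0] is replaced with NA
vals <- c(-2, 0, 3, 7)   # made-up cell values
vals[vals <= 0] <- NA
vals                     # NA NA 3 7
```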
Next, we find the top 20% of the species data, assigning those cells a value of 1, and all the other non-NA cells a value of zero. The quantile() function is very useful for this.
#?quantile what does the quantile function do?
spp_cutoff <- quantile(spp_res,0.8) # Find the value of the 80th percentile
spp_maxVal <- cellStats(spp_res,max) #find the maximum
# Our reclassification matrix. Make sure you know what this is saying
rcl_mat <- c(-Inf,spp_cutoff,0,
spp_cutoff,spp_maxVal,1)
# Reclassify the species layer
spp_binary <- reclassify(spp_res,rcl=rcl_mat)
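To convince yourself that the cutoff-and-reclassify step does what we want, here is the same top-20% logic run on a plain vector of made-up values (base R only, no raster objects involved):

```r
# made-up "cell values"; quantile() finds the 80th-percentile cutoff,
# and everything above it gets a 1 (our "top 20%")
vals <- c(1, 4, 6, 8, 10, 2, 9, 5, 7, 3)
cutoff <- quantile(vals, 0.8)           # ~8.2 with R's default quantile type
binary <- ifelse(vals > cutoff, 1, 0)
sum(binary)                             # 2 of the 10 values exceed the cutoff
```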
We can visualize the result of this reclassification just by plotting, as before:
# Because we have binary data now, I want to change the color scheme again
binary_cols <- c("white","firebrick")
plot(spp_binary,col=binary_cols,legend=F,main="Top 20% of Species Richness")
map('world',fill=T,add=T,col='gray')
***
Now we just do a similar process for the threats.
TASK: Reclassify the threats layer (make sure it is the one with the correct extent!) to assign a value of 1 to the top 20% of the cells, and a zero to the others
It should look something like this:
Now, all we have to do is overlay the two layers to find our hotspots. We do this with the overlay() function, which takes two or more rasters and applies a function to them, given by us in the fun argument. In this case, because the two rasters we are combining contain only ones and zeroes, we can simply add them together! But remember for later that you can write your own function to combine rasters in other ways.
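As a taste of that flexibility, here is what a custom combination function might look like. The weighting is purely illustrative (it is not part of the lab's analysis); any vectorized function of two numeric inputs that returns one output will work as the fun argument to overlay():

```r
# hypothetical example: weight the first layer twice as heavily as the second
weighted_sum <- function(x, y, w = 2) {
  w * x + y
}
weighted_sum(1, 1)  # a cell in the top 20% of both layers would score 3
# hotspots_weighted <- overlay(spp_binary, threat_binary, fun = weighted_sum)
```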
# the hotspots
hotspots <- overlay(spp_binary,threat_binary,fun=function(x,y){x+y})
# color breakpoints. We need three colors now! (cell values of 0,1,or 2)
brks_hotspots <- seq(0,3,length.out=4)
hotspot_cols <- c("white","lightblue","firebrick")
# plot the hotspots!
plot(hotspots,col=hotspot_cols,legend=F,main="Hotspots");map('world',fill=T,add=T,col='gray80')
plot(hotspots,col=hotspot_cols,ext=extent(-121,-117,32,35),main="Hotspots, SB Channel",legend=F)
map('world',fill=T,add=T,col='gray80')
Voila! Our hotspots.
This may have seemed like a lot to you. You may be thinking, “R seems much more complicated than just doing this in ArcMap.” You are of course entitled to your opinion, but once you are a little familiar with R, I believe you will find it much quicker than a lot of the clunkiness that comes with ArcGIS. For example, here is the entire hotspot analysis above in fewer than 30 lines of code. I would argue this is MUCH faster than doing the analysis in Arc. And it is fully reproducible, (usually) throws understandable error messages when something goes wrong, and is easy to share with others and adapt to other data or purposes.
Notice that I wrote and included my own function (in this case, a function that takes a raster and a given quantile, and returns a binary raster, assigning ones and zero to those cells above or below a threshold quantile). Writing your own functions is SUPER useful in R, and if you are interested, try writing your own for the next part of this lab.
### import data ###
all_spp <- raster("E:/TA files/CP2017_Owen/R lab development/Species_data/ca_curr_sp_rich.tif")
all_threats <- raster("E:/TA files/CP2017_Owen/R lab development/Threats_data/full_modelnv.tif")
#### Crop, resample, and reclassify ###
all_spp <- reclassify(all_spp,rcl=c(-Inf,0,NA)) # reclass 0 to NA
threats_crop <- crop(all_threats,all_spp) # crop threats to species
spp_res <- resample(all_spp,threats_crop,method='ngb') # resample species to threat's resolution
#### Function to output a binary raster based on a user-given quantile (default is top 20%) ###
reclassify_topx <- function(rast,quant=0.8) {
topx <- quantile(rast,quant) #find the cutoff value at the given quantile (default is the 80th percentile)
maxVal <- cellStats(rast,max) #find the maximum
rcl <- c(-Inf,topx,0,
topx,maxVal,1) # reclassify matrix (see help file for ?reclassify)
out <- reclassify(rast,rcl=rcl)
return(out) # returns the new binary raster
}
### Find top 20%, using the code from above. We could easily choose a different quantile here. ###
all_threats_top20 <- reclassify_topx(threats_crop,quant=0.8)
all_spp_top20 <- reclassify_topx(spp_res,quant=0.8)
### overlay and plot the hotspots ###
hotspots <- overlay(all_threats_top20,all_spp_top20,fun=function(x,y){x+y})
brks_hotspots <- seq(0,3,length.out=4) # color breakpoints
hotspot_cols <- c("white","lightblue","firebrick") # colors
plot(hotspots,col=hotspot_cols);map('world',fill=T,add=T,col='gray80')
The above showed you some of the basics of raster analysis in R. Now it’s your turn to apply these tools to a related spatial analysis problem. If you remember, our guiding question for this week was:
What are the spatial mismatches of local vs. global threats in the CA Current?
Using the table below, choose a set of threat layers (Climate, Fishing, Land-Based, or Other), and:
Download the appropriate set of rasters (see Table) and load them into R.
Stack the layers and plot them (HINT: if you use the stack function, then you can just call plot once and it will plot all of the layers)
Get an idea of the distribution of the data using hist(), cellStats(), or visually by zooming in to different sections of your rasters.
Overlay the rasters and perform raster calculations (reclassify, overlay, maybe calc) to create a cumulative threat layer. You may do this however you deem appropriate. Some suggestions:
There are many ways to perform these analyses. It may be helpful to outline your analysis by hand or with commented lines (##) before you begin, so you have a clear idea of the steps involved. If you run into errors with certain functions, make sure to read the help files first (e.g., ?overlay), which may give you an idea of what went wrong.
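For instance, a commented outline (purely illustrative; your steps may differ) could look like:

```r
## 1. import my chosen set of threat rasters and stack() them
## 2. plot and hist() the stack to see extents, resolutions, and value ranges
## 3. crop/resample if the layers don't line up
## 4. reclassify each layer (e.g. to its top 20%)
## 5. overlay() the layers into a cumulative threat raster
## 6. plot the result with a basemap and compare to the species hotspots
```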
| File Name | Description | Threat Category |
|---|---|---|
| impact_acid | Climate change: ocean acidification | Climate |
| impact_uv | Climate change: UV change | Climate |
| impact_sst | Climate change: sea temp. change | Climate |
| impact_rec_fish | Fishing: recreational | Fishing |
| impact_dem_nd_hb | Fishing: demersal non-destr. high bycatch | Fishing |
| impact_dem_nd_lb | Fishing: demersal non-destr. low bycatch | Fishing |
| impact_pel_hb | Fishing: pelagic high bycatch | Fishing |
| impact_pel_lb | Fishing: pelagic low bycatch | Fishing |
| impact_dem_d | Fishing: demersal destructive | Fishing |
| impact_nutrients | Nutrient input | Land-Based |
| impact_sed_decrease | Sediment input: decrease | Land-Based |
| impact_sed_increase | Sediment input: increase | Land-Based |
| impact_organic | Pollution input: organic | Land-Based |
| impact_inorganic | Pollution input: inorganic | Land-Based |
| impact_coastal_e | Coastal engineer.: habitat alteration | Land-Based |
| impact_beach_a | Direct human impact: beach trampling | Land-Based |
| impact_light_pol | Pollution input: light/noise | Land-Based |
| impact_pplants | Power, desalination plants | Land-Based |
| impact_dep_ocean | Pollution input: atmospheric | Land-Based |
| impact_trash | Ocean dumping: marine debris | Other |
| impact_pollution | Ocean pollution (from ships/ports) | Other |
| impact_shipping | Shipping (commercial, cruise, ferry) | Other |
| impact_pens | Aquaculture: finfish (predators) | Other |
| impact_invasives | Invasive species (from ballast, etc.) | Other |
| impact_oil | Benthic structures (e.g. oil rigs) | Other |